24 research outputs found

    Development and application of consumer credit scoring models using profit-based classification measures

    No full text
    This paper presents a new approach to consumer credit scoring that tailors a profit-based classification performance measure to credit risk modeling. This performance measure takes into account the expected profits and losses of credit granting and thereby better aligns the model developers' objectives with those of the lending company. It is based on the Expected Maximum Profit (EMP) measure and is used to find a trade-off between the expected losses, driven by the exposure of the loan and the loss given default, and the operational income generated by the loan. A major advantage of the proposed measure is that it allows the optimal cutoff value, which is necessary for model implementation, to be calculated. To test the proposed approach, we use a dataset of loans granted by a government institution and benchmark the accuracy and monetary gain obtained when EMP, accuracy, and the area under the ROC curve are used as measures for selecting model parameters and for determining the respective cutoff values. The results show that the proposed profit-based classification measure outperforms the alternative approaches in terms of both accuracy and monetary value on the test set, and that it facilitates model deployment.

    Business oriented data analytics: theory and case studies.

    No full text
    This PhD thesis focuses on predictive analytics in a business environment. Unlike explanatory modeling, which aims at gaining insight into structural dependencies between variables of interest, the objective of predictive analytics is to construct data-driven models that produce operationally accurate forecasts. Such a predictive analytics tool consists of two components: (1) data-driven models designed to predict future observations and (2) methods to assess the predictive power of such models. This dissertation focuses on a subdomain of predictive analytics: binary classification. Hence, two components are of interest: the classification models themselves and the classification performance measures. We argue that profitability should be integrated into both components. Furthermore, we propose an approach that looks at benefits and costs instead of misclassification costs alone. By focusing on benefits (and profit) rather than costs, we stay closer to the business reality and aid the adoption of classification techniques in industry. Therefore, a profit-based classification performance measure is developed and applied to real-life business cases. Moreover, an exploratory study on the incorporation of the profitability criterion into the model building step is presented. Finally, this PhD thesis discusses two case studies that demonstrate the usefulness of data analytics in a business context.
    Pages: 213. Status: published.

    Profit-based feature selection using support vector machines - general framework and an application for customer retention

    No full text
    Churn prediction is an important application of classification models, which identify the customers most likely to attrite based on their characteristics, described by, e.g., socio-demographic and behavioral variables. Since more and more such features are now captured and stored in the respective computational systems, appropriate handling of the resulting information overload becomes a highly relevant issue when building customer retention systems based on churn prediction models. As a consequence, feature selection is an important step of the classifier construction process. Most feature selection techniques, however, are based on statistically inspired validation criteria, which do not necessarily lead to models that optimize the goals specified by the respective organization. In this paper we propose a profit-driven approach for classifier construction and simultaneous variable selection based on support vector machines. Experimental results show that our models outperform conventional feature selection techniques, achieving superior performance with respect to business-related goals.

    A new knowledge-based constrained clustering approach: theory and application in direct marketing

    No full text
    Clustering has always been an exploratory but critical step in the knowledge discovery process. Often unsupervised, the clustering task has received considerable interest when reinforced by different kinds of inputs provided by the user. This paper presents an approach that makes it possible to incorporate business knowledge in order to guide the clustering algorithm. A formalization of the fact that an intuitive a priori prioritization of the variables might exist is presented and applied in a direct marketing context using recent data. By providing the analyst with a new approach offering different clustering perspectives, this paper proposes a straightforward way to apply constrained clustering with soft attribute-level constraints based on feature order preferences.
    Journal: Applied Soft Computing. Publisher: Elsevier. Link: http://dx.doi.org/10.1016/j.asoc.2014.06.002. Content type: article. Copyright © 2014 Elsevier B.V. All rights reserved. Status: published.

    Towards comprehensible software fault prediction models using Bayesian network classifiers

    No full text
    Software testing is a crucial activity during software development, and fault prediction models assist practitioners by providing an upfront identification of faulty software code, drawing upon the machine learning literature. While the Naive Bayes classifier in particular is often applied in this regard, with predictive performance and comprehensibility cited as its major strengths, a number of alternative Bayesian algorithms that make it possible to construct simpler networks with fewer nodes and arcs remain unexplored. This study contributes to the literature by considering 15 different Bayesian Network (BN) classifiers and comparing them to other popular machine learning techniques. Furthermore, the applicability of the Markov blanket principle for feature selection, which is a natural extension to BN theory, is investigated. The results, both in terms of the AUC and the recently introduced H-measure, are rigorously tested using the statistical framework of Demsar. It is concluded that simple and comprehensible networks with fewer nodes can be constructed using BN classifiers other than the Naive Bayes classifier. Furthermore, it is found that comprehensibility and predictive performance need to be balanced, and that the development context should also be taken into account during model selection.

    A novel profit maximizing metric for measuring classification performance of customer churn prediction models

    No full text
    Interest in data mining techniques has increased tremendously during the past decades, and numerous classification techniques have been applied in a wide range of business applications. Hence, the need for adequate performance measures has become more important than ever. In this paper, a cost-benefit analysis framework is formalized in order to define performance measures that are aligned with the main objective of the end users, i.e. profit maximization. A new performance measure is defined: the expected maximum profit criterion. This general framework is then applied to the customer churn problem with its particular cost-benefit structure. The advantage of this approach is that it assists companies in selecting the classifier that maximizes profit. Moreover, it aids practical implementation in the sense that it provides guidance about the fraction of the customer base to be included in the retention campaign.
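The idea in the abstract above, choosing the fraction of customers to target so that profit is maximized, can be illustrated with a minimal sketch. This is not the paper's expected maximum profit criterion, which averages over uncertain cost and benefit parameters; it is a simplified, deterministic maximum-profit calculation. The function name `max_profit_cutoff` and the parameters `benefit_tp` (gain per correctly targeted churner) and `cost_fp` (cost per wrongly targeted non-churner) are hypothetical names introduced only for this illustration.

```python
def max_profit_cutoff(scores, labels, benefit_tp, cost_fp):
    """Find the targeted fraction that maximizes average profit per customer.

    Simplified illustration with fixed, known benefit and cost values
    (an assumption; the paper treats such parameters as uncertain).
    scores: predicted churn scores, higher means more likely to churn.
    labels: 1 for actual churners, 0 for non-churners.
    """
    n = len(scores)
    # Rank customers from highest to lowest score (ties are not treated specially).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    best_k, best_total, total = 0, 0.0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        # Targeting a churner yields a benefit; targeting a non-churner costs money.
        total += benefit_tp if label == 1 else -cost_fp
        if total > best_total:
            best_k, best_total = k, total

    # Fraction of the customer base to target, and average profit per customer.
    return best_k / n, best_total / n


if __name__ == "__main__":
    fraction, profit = max_profit_cutoff(
        scores=[0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
        labels=[1, 1, 0, 1, 0, 0],
        benefit_tp=10.0,
        cost_fp=2.0,
    )
    print(fraction, profit)
```

Comparing classifiers by the profit value returned here, rather than by accuracy or AUC, mirrors the selection principle the abstract describes, under the stated simplifying assumptions.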

    A new SOM-based method for profile generation: theory and an application in direct marketing

    No full text
    The field of direct marketing is constantly searching for new data mining techniques to analyze the increasing amount of available data. Self-organizing maps (SOM) have been widely applied and discussed in the literature, since they make it possible to reduce the complexity of a high-dimensional attribute space while providing a powerful visual exploration facility. Combined with clustering techniques and the extraction of the so-called salient dimensions, they allow a direct marketer to gain high-level insight into a dataset of prospects. In this paper, a SOM-based profile generator is presented, consisting of a generic method that leads to value-adding, business-oriented profiles for targeting individuals with predefined characteristics. Moreover, the proposed method is applied in detail to a concrete case study from the concert industry. The performance of the method is then illustrated and discussed, and possible future research tracks are outlined.

    Development and application of consumer credit scoring models using profit-based classification measures

    No full text
    This paper presents a new approach to consumer credit scoring that tailors a profit-based classification performance measure to credit risk modeling. This performance measure takes into account the expected profits and losses of credit granting and thereby better aligns the model developers’ objectives with those of the lending company. It is based on the Expected Maximum Profit (EMP) measure and is used to find a trade-off between the expected losses, driven by the exposure of the loan and the loss given default, and the operational income generated by the loan. A major advantage of the proposed measure is that it allows the optimal cutoff value, which is necessary for model implementation, to be calculated. To test the proposed approach, we use a dataset of loans granted by a government institution and benchmark the accuracy and monetary gain obtained when EMP, accuracy, and the area under the ROC curve are used as measures for selecting model parameters and for determining the respective cutoff values. The results show that the proposed profit-based classification measure outperforms the alternative approaches in terms of both accuracy and monetary value on the test set, and that it facilitates model deployment.
    Journal: European Journal of Operational Research. Publisher: Elsevier. Link: http://dx.doi.org/10.1016/j.ejor.2014.04.001. Content type: article. Copyright © 2014 Elsevier B.V. All rights reserved. Status: published.